Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting

Authors

Anders Øland

Aayush Bansal

Roger B. Dannenberg

Bhiksha Raj

Abstract

In this work, we show that saturating output activation functions, such as the softmax, impede learning on a number of standard classification tasks. Moreover, we present results showing that the utility of the softmax does not stem from the normalization, as some have speculated. In fact, the normalization makes things worse. Rather, the advantage lies in the exponentiation of error gradients. This exponential gradient boosting is shown to speed up convergence and improve generalization. We demonstrate faster convergence and better performance on diverse classification tasks: image classification on CIFAR-10 and ImageNet, and semantic segmentation on PASCAL VOC 2012. In the latter case, using a state-of-the-art neural network architecture, the model converged 33% faster with our method than with the standard softmax activation, while also performing slightly better.
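As a rough illustration of what "saturating" means for the softmax (this is not the authors' method or experimental setup, only a sketch of a standard property), the snippet below computes the softmax Jacobian, p_i (δ_ij − p_j), for increasingly large logits. The helper names, the example logits, and the scale factors are all assumptions chosen for illustration. By contrast, a linear output has an identity Jacobian whatever the logit magnitude.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Jacobian J[i][j] = d softmax_i / d z_j = p_i * (delta_ij - p_j)."""
    p = softmax(z)
    n = len(p)
    return [
        [p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
        for i in range(n)
    ]


def max_abs_entry(matrix):
    """Largest absolute entry of a nested-list matrix."""
    return max(abs(v) for row in matrix for v in row)


# Hypothetical logits; scaling them up mimics an increasingly confident output.
base_logits = [2.0, 1.0, -1.0]
scales = [1.0, 5.0, 20.0]
jacobian_peaks = [
    max_abs_entry(softmax_jacobian([s * v for v in base_logits]))
    for s in scales
]

for s, peak in zip(scales, jacobian_peaks):
    print(f"logit scale {s:5.1f}: largest |d softmax / d z| = {peak:.3e}")
```

The largest Jacobian entry shrinks toward zero as the logits grow, so any error signal passed back through the softmax itself is scaled down accordingly. That is the sense in which the activation saturates.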

Similar resources

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...

Wavelet-based gradient boosting

A new data science tool named wavelet-based gradient boosting is proposed and tested. The approach is a special case of componentwise linear least squares gradient boosting, and involves wavelet functions of the original predictors. Wavelet-based gradient boosting takes advantage of the approximate ℓ1 penalization induced by gradient boosting to give appropriate penalized additive fits. The method...

Linear and Nonlinear Trading Models with Gradient Boosted Random Forests and Application to Singapore Stock Market

This paper presents new trading models for the stock market and tests whether they are able to consistently generate excess returns from the Singapore Exchange (SGX). Instead of conventional ways of modeling stock prices, we construct models which relate the market indicators to a trading decision directly. Furthermore, unlike a reversal trading system or a binary system of buy and sell, we allo...

Predictive Risk Mapping of Leptospirosis for North of Iran Using Pseudo-absences Data

Leptospirosis is a common zoonotic disease with a high prevalence in the world and is recognized as an important public health problem in both developing and developed countries owing to epidemics and increasing prevalence. Because of the high diversity of hosts that are capable of carrying the causative agent, this disease has an expansive geographical reach. Various environmental and social ...

Categorical Reparameterization with Gumbel-Softmax (ICLR 2017)

Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we present an efficient gradient estimator that replaces the non-differentiable sample from a categorical distribution with a differentiable sample from a nove...

Journal: CoRR

Volume: abs/1707.04199

Publication date: 2017
